ABSTRACT

This paper presents a method to obtain temporally predicted pictures with a quality that
is currently achievable only when H.263 video encoders apply the negotiable option
called Advanced Prediction Mode. Furthermore, we expect that our method will decrease
both the bitrate and the computational complexity. Our proposal aims to achieve the same
advantages as Advanced Prediction Mode, even when the latter is not used in H.263 low
bitrate video encoding. The solution is fully H.263 standard compatible but not yet
standardized. It can be used at CIF, QCIF and SQCIF resolution.

1 INTRODUCTION 


The H.263 standard for low bitrate video-conferencing [1]-[2] is based on a video
compression procedure which exploits the high degree of spatial and temporal correlation
in natural video sequences. The hybrid DPCM/DCT coding removes temporal redundancy
using inter-frame motion compensation. The residual error images are further processed
by block Discrete Cosine Transform (DCT), which reduces spatial redundancy by
decorrelating the pixels within a block and concentrating the energy of the block itself
into a few low order coefficients. The DCT coefficients are then quantized according to a
fixed quantization matrix that is scaled by a Scalar Quantization factor (SQ). Finally,
Variable Length Coding (VLC) achieves high encoding efficiency and produces a bitstream,
which is transmitted over ISDN (digital) or PSTN (analogue) channels at constant bitrates.
Due to the intrinsic structure of H.263, the final bitstream is produced at variable
bitrate, hence it has to be transformed to constant bitrate by the insertion of an output
buffer which acts as feedback controller. The buffer controller has to achieve a target
bitrate with consistent visual quality, low delay and low complexity. It monitors the
amount of bits produced and dynamically adjusts the quantization parameters, according to
its fullness status and to the image complexity.

For videoconferencing on ISDN lines, H.263 can achieve low bitrates in the range of
64-256 kbps with picture formats such as CIF (352 pixels per 288 lines) and QCIF (176
pixels per 144 lines), depending on the scene complexity. Owing to the compatibility with
the historically first videoconferencing standard, H.261 [3], bitrates greater than 256
kbps are also possible. The maximum bitrate for videophone systems on PSTN lines is 20
kbps, achievable only with QCIF and SQCIF (128 pixels per 96 lines) format pictures and
current modems limited to 28.8 kbps.

The H.263 coding standard defines the techniques to be used and the syntax of the
bitstream. There are some degrees of freedom in the design of the encoder. The standard
places no constraints on important processing stages such as motion estimation, adaptive
scalar quantization, and bitrate control.

In this paper we introduce a new method for H.263 low bitrate video encoders
and decoders, to achieve high quality temporally predicted pictures. Similar
quality is currently achievable only in H.263 terminals that make use of the
so-called Advanced Prediction Mode, one of the four H.263 negotiable options.
The method is a motion vectors post-filtering (MVPF), to be applied between the
motion estimation and motion compensation in the encoding terminal, or before
the motion compensation in the decoding terminal. Even though it should be
largely independent of the motion estimation strategy, we present it jointly
with our new motion estimator, since we think that the best performance will be
achieved through the joint action of the MVPF with the new motion estimator.
The main advantages of our proposal are the following: 1) no perceptible
degradation of the final image quality when looking at real time
videoconferencing sequences, 2) increase of the transmission channel capacity,
3) decrease of the computational complexity of the motion estimation stage.
The proposal can be used at CIF, QCIF and SQCIF resolution.
The organisation of this paper is as follows: Sections 2 and 3 summarise the main
features of the H.263 standard and Advanced Prediction Mode, respectively. In Sections 4
and 5 we introduce the new motion estimation and the motion vectors post-filtering.
Section 6 presents some experiments with the H.263 standard. Finally, some conclusions
are drawn in Section 7 and the invention claims are reported in Section 8.

2 OVERVIEW OF H.263 STANDARD

As shown in Fig. 1, the H.263 video compression is based on an inter-frame DPCM/DCT
encoding loop: there is a motion compensated prediction from a previous image to the
current one and the prediction error is DCT encoded. At least one frame is a reference
frame, encoded without temporal prediction. Hence the basic H.263 standard has two
types of pictures: I-pictures that are strictly intra-frame encoded and P-pictures that
are temporally predicted from earlier frames.

Figure 1: Basic DPCM/DCT video compression block diagram.

The H.263 standard defines a hierarchical bitstream syntax. There are four hierarchy
layers: picture level, Group Of Blocks level (GOB), macroblock level (MB) and block
level. The last is the elementary unit over which the DCT operates; it consists of 8 × 8
pixels. A macroblock is composed of four luminance (Y) blocks, covering a 16 × 16 area in
a picture, and two chrominance blocks (U and V), due to the lower chrominance resolution
(see Fig. 2).

The basic H.263 motion estimation and compensation stages operate on macroblocks.
The coarseness of quantization is defined by a quantization parameter for the first three
layers and a fixed quantization matrix which sets the relative coarseness of quantization
for each coefficient. Frame skipping is also used as a necessary way to reduce the bitrate
while keeping an acceptable picture quality. As the number of skipped frames is normally
variable and depends on the output buffer fullness, the buffer regulation should be related
in some way to frame skipping and quantizer step size variations.

A set of four negotiable options can further improve the performance of H.263 in
comparison to H.261. They are named respectively Advanced Prediction Mode (APM),
Unrestricted Motion Vectors (UMV), Syntax-based Arithmetic Coding (SAC) and PB-frames
(PBF). The first option, APM, performs motion estimation and temporal prediction on
four 8 × 8 blocks instead of only one 16 × 16 macroblock. The second option, UMV, reduces
artefacts on the image boundaries since the motion vectors are allowed to point outside
the coded picture area. The third option, SAC, can reduce the final bitrate better than
traditional Huffman VLC, but it needs more computational power. Finally, PB-frames
are bi-directionally predicted coded pictures always belonging to a pair of frames; this
coding constraint allows bi-directional temporal prediction to be exploited without
introducing relevant overhead. Therefore, the complete H.263 standard provides three
frame coding modes: I, P and PB. Further details about the H.263 standard can be found
in [1]-[2].

Figure 2: The MB structure.


2.1 Motion estimation 

In the H.263 main profile, one motion vector per MB is assigned. The motion estimation
strategy is not specified, but the motion vector range is fixed to [-16, +15.5] pixels in
a picture for both components. This range can be extended to [-31.5, +31.5] only when
the UMV and APM options are jointly used. Every macroblock vector (MV) is then
differentially encoded with a proper VLC. The prediction vector (PMV), to be added to
the vector differences, is obtained as the median value of the three surrounding MB
vectors (MV1, MV2, MV3), according to:

PMV = median(MV1, MV2, MV3)

Hence, every PMV component is the median value of the three candidate predictors for
this component (see case A of Fig. 3).

In the special cases at the borders of the current GOB or picture the following decision
rules are applied in increasing order:

• the candidate predictor MV1 is set to zero if the corresponding macroblock is outside
the picture (at the left side, as in case B of Fig. 3);

• the candidate predictors MV2 and MV3 are set to MV1 if the corresponding
macroblocks are outside the picture (at the top) or outside the GOB (at the top, as in
case D of Fig. 3);

• the candidate predictor MV3 is set to zero if the corresponding macroblock is outside
the picture (at the right side, as in case C of Fig. 3);

• when the corresponding macroblock was coded in INTRA mode (if not in PB-frames
mode) or was not coded at all, the candidate predictor is set to zero.

Figure 3: Motion vector prediction for a macroblock.
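The median prediction described above can be illustrated with a minimal sketch. It covers
only the general case (case A of Fig. 3); the border rules and the VLC itself are left
out, and the function names and sample vectors are hypothetical, chosen for illustration
only.

```python
def median3(a, b, c):
    """Return the median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv1, mv2, mv3):
    """Component-wise median of the three candidate predictors.

    Each vector is an (x, y) tuple in pixels with half-pixel resolution.
    """
    return (median3(mv1[0], mv2[0], mv3[0]),
            median3(mv1[1], mv2[1], mv3[1]))


def decode_mv(pmv, mvd):
    """Recover a macroblock vector from its predictor and the vector difference."""
    return (pmv[0] + mvd[0], pmv[1] + mvd[1])


# Hypothetical neighbouring vectors MV1, MV2, MV3.
pmv = predict_mv((1.5, -2.0), (0.5, 3.0), (4.0, 0.0))
print(pmv)                          # (1.5, 0.0)
print(decode_mv(pmv, (-0.5, 1.0)))  # (1.0, 1.0)
```

Note that the predictor (1.5, 0.0) is not one of the three candidates: because the median
is taken separately per component, the horizontal and vertical parts may come from
different neighbours.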

The motion estimation plays a fundamental role in the encoding process, since the quality
of temporally predicted pictures strongly depends on the accuracy and reliability of the
motion vectors. The temporal prediction block diagram is shown in Fig. 4.


3 THE ADVANCED PREDICTION MODE 

This Section describes the optional Advanced Prediction Mode of H.263. This option
includes overlapped block motion compensation and the possibility of four motion vectors
per macroblock. The capability of this mode is signalled by external means (for example
Recommendation H.245 [5]).

In APM, motion vectors are allowed to cross picture boundaries, as is the case in the UMV
option. The extended motion vector range feature of UMV is not automatically included
in APM; it is active only if the UMV option is selected. If APM is used in
combination with the PBF option, overlapped motion compensation is only used for
prediction of the P-pictures, not for the B-pictures.

In APM, four motion vectors per MB can be assigned if the temporal prediction error is
lower when four block vectors are used (which are then encoded and transmitted) instead
of only one macroblock vector (which is then encoded and transmitted), according to a
proper criterion. The one or four vectors decision is indicated by the MCBPC codeword
for each macroblock. If only one motion vector is transmitted for a certain macroblock,
this is defined as four vectors with the same value. If MCBPC indicates that four motion
vectors are transmitted for the current macroblock, the information for the first motion
vector is transmitted as the codeword "MVD" and the information for the three additional
motion vectors is transmitted as the codewords "MVD2-4".

Figure 4: Temporal prediction block diagram.

The vectors are obtained by adding predictors to the vector differences indicated by MVD
and MVD2-4, in a similar way as when only one motion vector per macroblock is present.
Again the predictors are calculated, by median filtering, separately for the horizontal
and vertical components. However, the candidate predictors MV1, MV2 and MV3 are
redefined as indicated in Fig. 5.

The numbering of the motion vectors is equivalent to the numbering of the four luminance
blocks as given in Fig. 2. The motion vector for both chrominance blocks is derived by
calculating the sum of the four luminance vectors and dividing this sum by 8; the
component values of the resulting sixteenth pixel resolution vectors are modified towards
the nearest half pixel position.


3.1 Overlapped motion compensation 

The APM also specifies the luminance and chrominance motion compensation technique.
Each luminance pixel in a block is a weighted average of three prediction values. In order
to obtain the three prediction values, two motion vectors are used besides the motion
vector of the current luminance block (MVc): the motion vector of the block at the left
or right side of the current luminance block (respectively MVl or MVr), and the motion
vector of the block above or below the current luminance block (respectively MVa or
MVb). Remote motion vectors from other GOBs are used in the same way as remote
motion vectors inside the current GOB.

Figure 5: Redefinition of the candidate predictors MV1, MV2 and MV3 for each of the
luminance blocks in a macroblock.

For each pixel, the remote motion vectors of the blocks at the two nearest block borders
are used. This means that for the upper half of the block the motion vector corresponding
to the block above the current block is used, while for the lower half of the block the
motion vector corresponding to the block below the current block is used. Similarly, for
the left half of the block the motion vector corresponding to the block at the left side of
the current block is used, while for the right half of the block the motion vector
corresponding to the block at the right side of the current block is used, as shown in
Fig. 6.

In our proposal we continue to use overlapped block motion compensation
as it is specified in the H.263 APM. Our invention is focused only on the
motion estimation part of APM.


4 THE NEW MOTION ESTIMATION 

For estimating the true motion from a sequence of pictures we started from the high
quality 3-Dimensional Recursive Search block matching algorithm, presented in [7] and
[8]. Unlike the more expensive full-search block matchers that estimate all the
possible displacements within a search area, this algorithm only investigates
a very limited number of possible displacements. By carefully choosing the
candidate vectors, a high performance can be achieved, approaching almost
true motion, with a low complexity design. Its attractiveness was earlier proven in
an IC for SD-TV consumer applications [10].

Figure 6: Motion vectors used in the overlapped block motion compensation.

4.1 Basic concepts 

In block-matching motion estimation algorithms, a displacement vector, or motion vector
d(b_c, t), is assigned to the centre b_c = (x_c, y_c)^tr of a block of pixels B(b_c) in the
current image I(x, t), where tr means transpose. The assignment is done if B(b_c) matches
a similar block within a search area SA(b_c), also centred at b_c, but in the previous
image I(x, t - T), with T = nTg (n integer) representing the time interval between two
subsequent decoded images. The similar block has a centre which is shifted with respect
to b_c over the motion vector d(b_c, t). To find d(b_c, t), a number of candidate vectors C
are evaluated applying an error measure e(C, b_c, t) to quantify block similarity. Figure 7
illustrates the procedure. The pixels in the block B(b_c) have the following positions:

x_c - X/2 \le x \le x_c + X/2
y_c - Y/2 \le y \le y_c + Y/2

with X and Y the block width and block height respectively, and x = (x, y)^tr the spatial
position in the image.

Figure 7: Illustration of block-matching.

The candidate vectors are selected from the candidate set CS(b_c, t), which is determined
by:


CS(b_c, t) = {


(1)


where the update vectors U1(b_c) and U2(b_c) are randomly selected from an update set US,
defined as:

with the integer updates US_i(b_c) stated by:

^0 
1 
^0 
2 

.3 



(2) 


The fractional updates US_f(b_c), necessary to realise half-pixel accuracy, are defined by


Either U1(b_c) or U2(b_c) equals the zero update.

From these Equations it can be concluded that the candidate set consists of spatial and
spatio-temporal prediction vectors from a 3-D neighborhood and an updated prediction
vector. This implicitly assumes spatial and/or temporal consistency. The updating
process involves updates added to either of the spatial predictions. Figure 8 shows where
the spatial and spatio-temporal prediction vectors are located relative to the current
block.

Figure 8: Positions of the prediction vectors relative to the current block.

The displacement vector d(b_c, t), resulting from the block-matching process, is a
candidate vector C which yields the minimum value of the error function e(C, b_c, t):

d(b_c, t) \in \{ C \in CS(b_c, t) \mid e(C, b_c, t) \le e(V, b_c, t) \;\; \forall V \in CS(b_c, t) \}     (4)

The error function is a cost function of the luminance values, I(x, t), and those of the
shifted block from the previous field, I(x - C, t - T), summed over the block B(b_c). A
common choice, which we also use, is the Sum of the Absolute Differences (SAD). The
error function is defined by:

e(C, b_c, t) = \sum_{x \in B(b_c)} | I(x, t) - I(x - C, t - T) |     (5)

The vector range is ±31.5 pixels in both directions, as in the joint application of UMV
and APM.
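As an illustration of Equations (4) and (5), the following minimal sketch evaluates the
SAD for a handful of candidate vectors and keeps the best one. It is not the estimator
itself: the candidate list is hypothetical (the actual candidate set comes from Equation
(1)), images are plain nested lists, the block is addressed by its top-left corner rather
than its centre, and only integer-pixel vectors are handled.

```python
def sad(cur, prev, cx, cy, cand, bw, bh):
    """Sum of absolute differences between the block of `cur` whose top-left
    corner is (cx, cy) and the block of `prev` shifted by the candidate vector,
    i.e. |I(x, t) - I(x - C, t - T)| summed over the block."""
    dx, dy = cand
    total = 0
    for y in range(cy, cy + bh):
        for x in range(cx, cx + bw):
            total += abs(cur[y][x] - prev[y - dy][x - dx])
    return total


def best_candidate(cur, prev, cx, cy, candidates, bw, bh):
    """Return the candidate vector with the lowest SAD (first one wins ties)."""
    return min(candidates, key=lambda c: sad(cur, prev, cx, cy, c, bw, bh))


# Toy 8x8 images: the current image is the previous one moved right by one pixel.
W = H = 8
prev = [[x * x + 3 * y for x in range(W)] for y in range(H)]
cur = [[prev[y][max(x - 1, 0)] for x in range(W)] for y in range(H)]

candidates = [(0, 0), (1, 0), (-1, 0), (0, 1)]  # hypothetical candidate set
print(best_candidate(cur, prev, 2, 2, candidates, 4, 4))  # (1, 0)
```

Only four error evaluations are needed here, which is the point of a candidate-based
search compared with a full search over every displacement in the search area.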


4.2 Iterative estimation 

To further improve the motion field consistency, the estimation process is iterated several
times, using the motion vectors calculated in the previous iteration to initialize the
current iteration, as temporal candidate vectors.
During the first and the third iterations, both previous and current images are scanned
from top to bottom and from left to right, that is in the "normal video" scanning direction.
In contrast, the second and fourth iterations are executed with both images
scanned in the "anti-video" direction, from bottom to top and from right to left.
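The alternation of scanning directions can be sketched as a small generator. The function
name and the (column, row) block indexing are assumptions made for illustration; the rule
"odd iterations in video direction, even iterations in anti-video direction" simply
generalises the four iterations described above.

```python
def scan_order(blocks_x, blocks_y, iteration):
    """Yield (column, row) block indices for a 1-based iteration number.

    Odd iterations follow the normal video direction (top to bottom, left to
    right); even iterations follow the anti-video direction (bottom to top,
    right to left).
    """
    rows = range(blocks_y)
    cols = range(blocks_x)
    if iteration % 2 == 0:
        rows, cols = reversed(rows), reversed(cols)
    cols = list(cols)  # reused for every row, so it must not be a one-shot iterator
    for row in rows:
        for col in cols:
            yield (col, row)


print(list(scan_order(3, 2, 1)))
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
print(list(scan_order(3, 2, 2)))
# [(2, 1), (1, 1), (0, 1), (2, 0), (1, 0), (0, 0)]
```

Reversing the scan lets vectors propagate in the opposite direction on the next pass, so
that good estimates can spread to blocks that precede them in normal video order.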

The candidate vectors are selected from the new candidate set CS'(b_c, t), defined by:



where 


T) 


for i = 1, at every first iteration on an image pair, and
for i >= 2, with i indicating the current iteration number.

Furthermore, the first and second iterations are applied on pre-filtered copies of the two
decoded images and without sub-pixel accuracy, while the third and fourth iterations are
done directly on the original (decoded) images and produce half-pixel accurate motion
vectors.

The pre-filtering consists of a horizontal average over four pixels:

I_pf(x, y, t) = \frac{1}{4} \sum_{k} I(x div 4 + k, y, t)     (6)

where I(x, y, t) is the luminance value of the current pixel, I_pf(x, y, t) is the
corresponding filtered version, k runs over the four horizontally adjacent pixels and div
is the integer division. Pre-filtering prior to motion estimation has two main
advantages: the first is an increase of the vector field coherency, due to the noise
reduction effect of the filtering itself; the second is a decrease of the
computational complexity, since sub-pixel accuracy is not necessary in this case.
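A minimal sketch of such a horizontal pre-filter is given below. It assumes one possible
reading of Equation (6), namely that every pixel is replaced by the mean of the aligned
group of four horizontally adjacent pixels that contains it; the exact alignment of the
four pixels, the handling of a row whose width is not a multiple of four, and the function
names are all assumptions.

```python
def prefilter_row(row):
    """Replace each pixel by the mean of the aligned group of four pixels
    that contains it (a trailing group shorter than four is averaged as is)."""
    out = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        mean = sum(group) / len(group)
        out.extend([mean] * len(group))
    return out


def prefilter(image):
    """Apply the horizontal four-pixel average to every row of a 2-D list."""
    return [prefilter_row(row) for row in image]


print(prefilter([[10, 20, 30, 40, 8, 8, 0, 0]]))
# [[25.0, 25.0, 25.0, 25.0, 4.0, 4.0, 4.0, 4.0]]
```

Because the filtered image is constant over each group of four pixels, matching it at
sub-pixel positions brings little benefit, which is consistent with the first two
iterations running at integer accuracy only.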

The computational complexity of the motion estimation is practically independent of the
actual (variable) frame rate, for n <= 4. In fact, the number of iterations per image pair
varies according to the time interval between two decoded pictures, as shown in Table 1.
When n >= 5, we use the same iterations as with n = 4.

Table 1: Relation between iterations number and time interval.

4.3 Block subsampling, block overlapping and pixel subsampling

It is possible to decrease the computational price of the motion estimation by halving
the number of block vectors calculated, that is by using block subsampling [7, 8]. The
subsampled block grid is arranged in a quincunx pattern. If d_m = d(b_c, t) is a missing
vector, it can be calculated from the horizontally neighboring available ones d_a,
according to the following formula:

d_m = median(d_l, d_r, d_av)     (7)

where

d_l = d_a(b_c - (X, 0)^tr, t)

and 

The median interpolation acts separately on the horizontal and vertical components of 
the motion vectors. From one iteration to the following we change the subsampling grid 
in order to refine the vectors that were interpolated in the previous iteration. 

The matching error is calculated on blocks of size 2X × 2Y, but the best vector
is assigned to smaller blocks with dimensions X and Y. This feature is called block
overlapping, because the larger 2X × 2Y block overlaps the final X × Y block in the
horizontal and vertical directions. It contributes to improving the coherence and
reliability of the motion vector field.

Finally, since the computational effort required for a block matcher is almost linear with
the pixel density in a block, we also introduce a pixel subsampling factor of four. Hence
there are 2X · 2Y / 4 pixels in a large 2X × 2Y block where the matching error is
calculated, for every iteration. Again, from one iteration to the following, we also change
the pixel subsampling grid to spread the matching pixels. Fig. 9 shows the calculation
and assignment area and the two phases of block and pixel subsampling.

4.4 Final remarks

This new block matching motion estimator can calculate the true motion of
objects with great accuracy, yielding a very coherent motion vector field,
from the spatial and temporal points of view. This means that the VLC
differential encoding of macroblock vectors should achieve lower bitrates in
comparison with vectors estimated from "classical" full-search block matchers.
Furthermore, its very low complexity over-compensates the increased
global processing complexity due to the introduction of the MVPF stage in
both the encoding and decoding stages.

Figure 9: Block overlapping, block subsampling and pixel subsampling.

5 MOTION VECTORS POST-FILTERING 

In this Section we will describe the truly innovative part of our proposal, the motion
vectors post-filtering (MVPF).

In practice, we want to use the overlapped block motion compensation, as it
is currently specified in the APM of the H.263 standard, in both the encoding
and decoding terminals, while transmitting and receiving only MB motion vectors
(so as not to increase the bitrate). This means that both terminals have to use
the same MVPF, to re-assign the MB vectors to blocks of 8 × 8 pixels, as performed
in the motion estimation part of APM. Fig. 10 shows the temporal prediction block
diagram including the MVPF.

Figure 10: The MVPF in the temporal prediction block diagram.

Even if the MVPF should not be dependent on the estimation strategy, we strongly
recommend using it jointly with the motion estimator described in Section 4, to obtain
the best performance. Of course there are several solutions to calculate the block
vectors, for example a weighted averaging of the adjacent macroblock vectors; however, we
will describe in detail only what we consider the best solution, given the inherent
features of our new motion estimator: the block erosion MVPF.


5.1 Block erosion

As reported in the previous Sections, in the H.263 standard the motion information is
limited to one vector per macroblock of X × Y = 16 × 16 pixels; therefore the MVPF
performs a block erosion to eliminate fixed block boundaries from the vector field, by
re-assigning a new vector to each block of size (X/2) × (Y/2) = 8 × 8.

If MVc = d(b_c, t) is a macroblock vector centred in b_c and its four adjacent macroblock
vectors are given by:

MVl = d(b_c - (X, 0)^tr, t)
MVr = d(b_c + (X, 0)^tr, t)
MVa = d(b_c - (0, Y)^tr, t)
MVb = d(b_c + (0, Y)^tr, t)

the four 8 × 8 blocks, numbered as in Fig. 2, will be assigned their new vectors according
to the following:

MV1 = median(MVl, MVc, MVa)
MV2 = median(MVa, MVc, MVr)
MV3 = median(MVl, MVc, MVb)
MV4 = median(MVr, MVc, MVb)

Figure 11 shows the block erosion of a macroblock vector MVc into four block vectors
MV1, MV2, MV3, MV4.

Figure 11: Block erosion: from one vector per macroblock to one vector for every block.
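The block erosion step can be sketched as follows. This is a minimal illustration under
stated assumptions rather than a reference implementation: the median is taken separately
on the horizontal and vertical components (as for the other median operations in this
paper), macroblocks outside the picture are replaced by the nearest border macroblock, and
all function names are hypothetical.

```python
def median3(a, b, c):
    """Return the median of three numbers."""
    return sorted((a, b, c))[1]


def vmedian(u, v, w):
    """Component-wise median of three (x, y) vectors."""
    return (median3(u[0], v[0], w[0]), median3(u[1], v[1], w[1]))


def block_erosion(field):
    """Expand a macroblock vector field into a block vector field.

    `field[row][col]` is the (x, y) vector of one 16x16 macroblock. The result
    has twice as many rows and columns: one vector per 8x8 block.
    """
    rows, cols = len(field), len(field[0])

    def mv(r, c):
        # Assumption: neighbours outside the picture repeat the border vector.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return field[r][c]

    out = [[None] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            mvc = mv(r, c)
            mvl, mvr = mv(r, c - 1), mv(r, c + 1)   # left, right
            mva, mvb = mv(r - 1, c), mv(r + 1, c)   # above, below
            out[2 * r][2 * c] = vmedian(mvl, mvc, mva)          # MV1, top-left
            out[2 * r][2 * c + 1] = vmedian(mva, mvc, mvr)      # MV2, top-right
            out[2 * r + 1][2 * c] = vmedian(mvl, mvc, mvb)      # MV3, bottom-left
            out[2 * r + 1][2 * c + 1] = vmedian(mvr, mvc, mvb)  # MV4, bottom-right
    return out


# Hypothetical 3x3 macroblock field with two motions, A and B.
A, B = (0, 0), (4, 2)
field = [[A, B, B],
         [A, A, B],
         [A, A, A]]
blocks = block_erosion(field)
print(blocks[2][2], blocks[2][3], blocks[3][2], blocks[3][3])
# (0, 0) (4, 2) (0, 0) (0, 0)
```

In this example the centre macroblock carries motion A, but its top-right 8 × 8 block is
re-assigned motion B because both the macroblock above and the one to the right carry B.
Since the same function runs in the encoder and in the decoder on the same transmitted
macroblock vectors, both sides obtain identical block vectors without extra bits.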


5.2 Standard compatibility 

As far as we know, this solution has not been mentioned in the standard, but it is fully
H.263 compatible. At the start of the multimedia communication the two terminals
exchange data about their standard and non-standard processing capabilities (see [5]
for more details). If we assume that, during the communication set-up, both
terminals declare this MVPF capability, they will easily interface with each
other. Hence, the video encoder will transmit only MB vectors, while the
video decoder will post-filter them in order to have a different vector for
every block. In the temporal interpolation process both terminals use the
overlapped block motion compensation, as it is specified in the H.263 APM.
Thanks to this method, we can achieve the same image quality as if the APM
was used, but without increasing the bitrate.



If at least one terminal declares that it does not have this capability, a flag can be
forced in the other terminal to switch the MVPF off.


6 EVALUATION 


To simulate the H.263 encoding process we took the public domain C-language software
tmn-1.6, available on the Internet and written by Telenor R&D [6] during the H.263 Test
Model TMN 5 definition phase [9]. Although we briefly present only Tables related to
Teeny, a girl rotating her head very quickly (see Fig. 12), the observations that we will
make are generally valid also for other video sequences with CIF, QCIF or SQCIF format.

Figure 12: The Teeny test CIF video sequence.

Table 2 shows the behaviour of Teeny, encoded with a target bitrate of 256 kbps at CIF
resolution, with and without APM (and UMV jointly, since the H.263 standard suggests
their joint action). The mean picture quantizer QP decreases while decreasing the frame
rate, indicating that the spatial quality necessarily increases at the expense of the
frame rate. Also the luminance Mean Square Error (MSE) between the original image and its
decoded version is clearly decreasing, indicating not only that the spatial detail is
increasing, but also that the temporal prediction is still good, due to the robustness of
the encoding DPCM/DCT loop.

It is interesting to note that the motion vectors (macroblock information) need from 13 %
up to 18 % of the total bitrate in the basic H.263 standard, and 19-25 % in H.263
with APM and UMV. In this last case the mean quantizer is increased and so the image
sharpness is decreased, since the DCT quantization becomes coarser to "compensate" for
the extra bit budget required by the block motion vector information, while the output
buffer controller keeps the global bitrate constant. Finally, with APM and UMV
the motion information is increased by about 40 % in comparison with the basic H.263
standard.

Note: a QP close to 31 means coarse quantization and hence very poor spatial resolution;
for QP < 20 the detail is quite satisfactory.



frame rate   obtained bitrate   motion vectors   mean QP   encoding MSE
12.5 Hz      258.50 kbps        45.66 kbps       17.88     80.84
8.33 Hz      261.12 kbps        34.00 kbps       15.53     70.68

frame rate   obtained bitrate   motion vectors   mean QP   encoding MSE
12.5 Hz      258.6 kbps         64.23 kbps       20.35     79.22
8.33 Hz      261.15 kbps        49.13 kbps       18.51     72.20

Table 2: Teeny encoded at a target bitrate of 256 kbps without (top) and with (bottom)
APM and UMV.
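The percentages quoted in the text can be re-derived from the figures in Table 2. The
short script below is only a convenience check; the variable and function names are
illustrative.

```python
# (frame rate in Hz, obtained bitrate in kbps, motion vector bitrate in kbps)
basic = [(12.5, 258.50, 45.66), (8.33, 261.12, 34.00)]
apm_umv = [(12.5, 258.60, 64.23), (8.33, 261.15, 49.13)]


def share(total, motion):
    """Percentage of the total bitrate spent on motion vectors."""
    return 100.0 * motion / total


for (rate, tot_b, mv_b), (_, tot_a, mv_a) in zip(basic, apm_umv):
    print(f"{rate} Hz: basic {share(tot_b, mv_b):.1f} %, "
          f"APM+UMV {share(tot_a, mv_a):.1f} %, "
          f"motion bits +{100.0 * (mv_a / mv_b - 1):.1f} %, "
          f"extra {mv_a - mv_b:.2f} kbps")
```

At 12.5 Hz the motion vectors take about 17.7 % of the bitrate without APM and UMV and
about 24.8 % with them; at 8.33 Hz the shares are about 13.0 % and 18.8 %. The motion
information grows by roughly 40.7 % and 44.5 % respectively, that is 18.57 kbps and
15.13 kbps of additional block vector information.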

Thanks to our method, we can use this amount of bits for relaxing the DCT
coefficient quantization instead of encoding the motion vector information
related to blocks, so that we achieve sharper pictures than current
H.263 standard encoders with APM, without increasing the bitrate.

On the other hand, if the DCT coefficient quantization is not relaxed, we
can encode and transmit "typical H.263 plus APM quality" pictures, while
reducing the bitrate because no block motion information is transmitted,
thus increasing the channel efficiency.

Finally, in our method every block will be assigned its own motion vector,
while in the APM of the H.263 standard not all the macroblocks will be processed
as four separate blocks. In other words, in APM it is always possible that there
will remain a considerable number of macroblocks to which a single motion vector
is assigned, while our method always assigns one proper motion vector to every
block.


7 CONCLUSIONS 

The invention relates to a low bitrate video coding method fully compatible
with the H.263 standard and comprising a Motion Vectors Post-Filtering (MVPF)
step. This MVPF step assigns a different motion vector to every block composing
a macroblock, starting from the original motion vector of the macroblock
itself. In this way the temporal prediction is based on 8 × 8 pixel blocks instead
of 16 × 16 macroblocks, as is currently done when the negotiable option called
Advanced Prediction Mode (APM) is used in the H.263 encoder. The video
decoding terminal has to use the same MVPF step to produce the related
block vectors.

Furthermore, since only macroblock vectors are differentially encoded (in a
variable length fashion) and transmitted, a considerable bitrate reduction is also
achieved, in comparison with APM.

This method is not yet H.263 standardized, so it has to be signalled between
the two terminals, via the H.245 protocol. It can be used at CIF, QCIF and
SQCIF resolution.

8 CLAIMS

1. A motion estimation method for a low bitrate video encoder 
or decoder of signals for which a hierarchical bitstream syntax 
is defined including macroblock and block levels, said method 
being based on the use of the so-called advanced prediction 
mode in terms of motion estimation and compensation, wherein 
only one motion vector per macroblock is transmitted from the 
encoding side, while a non-linear operation is used on the 
decoding side for the motion estimation per block and on the 
encoding side for the picture prediction. 

2. A method according to claim 1, wherein said non-linear 
operation is a motion vector post-filtering step. 

3. A method according to any one of claims 1 and 2, wherein
said step is performed on transmitted vectors corresponding to
macroblocks of 16 x 16 picture elements, in order to have a
different motion vector for each block of 8 x 8 picture
elements.

4. A method according to claim 3, wherein said post-filtering
step performs a block erosion in order to eliminate fixed block
boundaries.

5. A method according to any one of claims 1 to 4, wherein the
bit-budget saved by encoding and transmitting only macroblock
vectors is re-used for a less coarse quantization within the
encoder.

6. A motion estimation device for implementing a method
according to any one of claims 1 to 5.

7. An H.263-like video encoder able to make use of the so- 
called prediction mode in terms of motion estimation and 
compensation, including in the motion estimation stage of its 
temporal prediction loop a device according to claim 6. 

8. An H.263-like video decoder including in its temporal
interpolation stage a device according to claim 6.